Thresholding Procedures for High Dimensional Variable Selection and Statistical Estimation

نویسنده

  • Shuheng Zhou
چکیده

Given n noisy samples with p dimensions, where n ≪ p, we show that the multistep thresholding procedure can accurately estimate a sparse vector β ∈ R in a linear model, under the restricted eigenvalue conditions (Bickel-Ritov-Tsybakov 09). Thus our conditions for model selection consistency are considerably weaker than what has been achieved in previous works. More importantly, this method allows very significant values of s, which is the number of non-zero elements in the true parameter. For example, it works for cases where the ordinary Lasso would have failed. Finally, we show that if X obeys a uniform uncertainty principle and if the true parameter is sufficiently sparse, the Gauss-Dantzig selector (CandèsTao 07) achieves the l2 loss within a logarithmic factor of the ideal mean square error one would achieve with an oracle which would supply perfect information about which coordinates are non-zero and which are above the noise level, while selecting a sufficiently sparse model.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Some Two-Step Procedures for Variable Selection in High-Dimensional Linear Regression

We study the problem of high-dimensional variable selection via some two-step procedures. First we show that given some good initial estimator which is l∞-consistent but not necessarily variable selection consistent, we can apply the nonnegative Garrote, adaptive Lasso or hard-thresholding procedure to obtain a final estimator that is both estimation and variable selection consistent. Unlike th...

متن کامل

Penalized model-based clustering with cluster-specific diagonal covariance matrices and grouped variables.

Clustering analysis is one of the most widely used statistical tools in many emerging areas such as microarray data analysis. For microarray and other high-dimensional data, the presence of many noise variables may mask underlying clustering structures. Hence removing noise variables via variable selection is necessary. For simultaneous variable selection and parameter estimation, existing pena...

متن کامل

Variable Selection via Penalized Likelihood

Variable selection is vital to statistical data analyses. Many of procedures in use are ad hoc stepwise selection procedures, which are computationally expensive and ignore stochastic errors in the variable selection process of previous steps. An automatic and simultaneous variable selection procedure can be obtained by using a penalized likelihood method. In traditional linear models, the best...

متن کامل

Scalable kernel-based variable selection with sparsistency

Variable selection is central to high-dimensional data analysis, and various algorithms have been developed. Ideally, a variable selection algorithm shall be flexible, scalable, and with theoretical guarantee, yet most existing algorithms cannot attain these properties at the same time. In this article, a three-step variable selection algorithm is developed, involving kernel-based estimation of...

متن کامل

Statistical estimation with model selection

The purpose of this paper is to explain the interest and importance of (approximate) models and model selection in Statistics. Starting from the very elementary example of histograms we present a general notion of finite dimensional model for statistical estimation and we explain what type of risk bounds can be expected from the use of one such model. We then give the performance of suitable mo...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009